68 research outputs found

    Improving OpenStack Swift interaction with the I/O stack to enable software defined storage

    This paper analyses how OpenStack Swift, a widely used distributed object storage middleware, interacts with the I/O subsystem through the operating system. This interaction, which seems organised and clean on the middleware side, becomes disordered on the device side when using mechanical disk drives, due to the way threads are used internally to request data. We show that, by modifying only the Swift threading model, we achieve a mean performance improvement of 18% with objects larger than 512 KiB, while obtaining similar performance with smaller objects. Moreover, unlike in the original design, this performance is obtained in a fair way: the bandwidth is shared equally between concurrently accessed objects. The new threading model also allows us to apply Software Defined Storage (SDS) techniques; we show an implementation of a bandwidth differentiation technique that can control each data stream while guaranteeing high utilization of the device. The research leading to these results has received funding from the European Community under the IOStack (H2020-ICT-2014-7-1) project, from the Spanish Ministry of Economy and Competitiveness under the TIN2015-65316-P grant, and from the Catalan Government under the 2014-SGR-1051 grant. To learn more about the IOStack H2020 project, please visit http://www.iostack.eu.
    Peer Reviewed. Postprint (author's final draft).
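
    To illustrate the bandwidth differentiation idea, a minimal sketch follows: each concurrently accessed object stream is throttled by its own token bucket. This is an illustration of the general technique, not Swift's actual mechanism; the class and the rates are invented for the example.

        import time

        class TokenBucket:
            """Caps one data stream at rate_bps bytes per second."""
            def __init__(self, rate_bps, burst_bytes):
                self.rate = rate_bps
                self.capacity = burst_bytes
                self.tokens = burst_bytes
                self.last = time.monotonic()

            def consume(self, nbytes):
                """Block until nbytes tokens are available, then spend them."""
                while True:
                    now = time.monotonic()
                    self.tokens = min(self.capacity,
                                      self.tokens + (now - self.last) * self.rate)
                    self.last = now
                    if self.tokens >= nbytes:
                        self.tokens -= nbytes
                        return
                    time.sleep((nbytes - self.tokens) / self.rate)

        # One bucket per object stream: each gets a controlled share of the
        # device bandwidth (here, 50 MiB/s with a 1 MiB burst allowance).
        bucket = TokenBucket(rate_bps=50 * 2**20, burst_bytes=2**20)
        bucket.consume(65536)   # account for a 64 KiB chunk before writing it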

    ECHOFS: a scheduler-guided temporary filesystem to leverage node-local NVMs

    The growth in data-intensive scientific applications poses strong demands on the HPC storage subsystem, as data needs to be copied from compute nodes to I/O nodes and vice versa for jobs to run. The emerging trend of adding denser, NVM-based burst buffers to compute nodes, however, offers the possibility of using these resources to build temporary file systems with specific I/O optimizations for a batch job. In this work, we present echofs, a temporary filesystem that coordinates with the job scheduler to preload a job's input files into node-local burst buffers. We present results measured with NVM emulation and different FS backends with DAX/FUSE on a local node, showing the benefits of our proposal and of such coordination. This work was partially supported by the Spanish Ministry of Science and Innovation under the TIN2015-65316 grant, the Generalitat de Catalunya under contract 2014-SGR-1051, and the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement no. 671951 (NEXTGenIO). Source code available at https://github.com/bsc-ssrg/echofs.
    Peer Reviewed. Postprint (author's final draft).
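
    A minimal sketch of the scheduler-coordinated stage-in step follows, assuming the scheduler hands the file system the list of input paths before the job is launched; all paths and names here are hypothetical.

        import shutil
        from pathlib import Path

        def preload(job_inputs, pfs_root, nvm_root):
            """Copy a job's input files from the parallel file system into
            the node-local burst buffer before the job starts."""
            for rel in job_inputs:
                src = Path(pfs_root) / rel
                dst = Path(nvm_root) / rel
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)   # the job then reads from nvm_root

        # e.g. preload(["exp1/mesh.dat", "exp1/params.yaml"],
        #              pfs_root="/gpfs/project", nvm_root="/mnt/pmem/job42")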

    DYON: Managing a new scheduling class to improve system performance in multicore systems

    Best Paper Award, ROME 2013. Due to the increase in the number of cores available in current systems, a lot of system software has started using some of these cores to perform tasks that help optimize application behaviour. Unfortunately, current onload mechanisms are too limited. On the one hand, there is no dynamic way to decide how many cores are taken from applications and given to these system helpers. On the other hand, the onload mechanisms do not offer enough control over when and where onloaded tasks should be executed. In this paper we propose a new Onload Framework that addresses these issues. First, we propose DYON, a dynamic and adaptive method to control the amount of extra CPUs offered to the Onload Framework to generate benefits for the whole system. Second, we propose a submission mechanism that, given a task, executes it if there are idle resources and rejects it otherwise. This feature is useful for moving the execution of small pieces of code out of the critical path (allowing parallel execution) when possible, or discarding them and executing code that does not rely on them.
    Award-winning. Postprint (author's final draft).
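
    The submission mechanism can be pictured as a try-submit call that runs a task only when a donated core is idle, and rejects it otherwise so the caller can fall back to code that does not depend on it. A minimal sketch, with the pool size and names as assumptions:

        from concurrent.futures import ThreadPoolExecutor
        import threading

        class OnloadPool:
            def __init__(self, ncores):
                self.pool = ThreadPoolExecutor(max_workers=ncores)
                self.idle = threading.Semaphore(ncores)

            def try_submit(self, fn, *args):
                """Run fn on a helper core if one is idle; reject otherwise."""
                if not self.idle.acquire(blocking=False):
                    return False              # no idle resources: rejected
                def task():
                    try:
                        fn(*args)
                    finally:
                        self.idle.release()   # core is idle again
                self.pool.submit(task)
                return True

        pool = OnloadPool(ncores=2)
        if not pool.try_submit(print, "running off the critical path"):
            pass   # fallback: execute code that does not rely on the task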

    YOLO: Speeding up virtual machine boot time by reducing I/O operations

    Several works have shown that the time to boot one virtual machine (VM) can last up to a few minutes in highly consolidated cloud scenarios. This time is critical, as VM boot duration defines how an application can react to demand fluctuations (horizontal elasticity). To limit the time to boot a VM as much as possible, we design the YOLO mechanism (You Only Load Once). YOLO optimizes the number of I/O operations generated during a VM boot process by relying on the boot image abstraction, a subset of the VM image (VMI) that contains the data blocks necessary to complete the boot operation. Whenever a VM is booted, YOLO intercepts all read accesses and serves them directly from the boot image, which has been stored locally on fast-access storage devices (e.g., memory, SSD, etc.). Creating boot images for 900+ VMIs from Google Cloud shows that only 40 GB is needed to store all the mandatory data, an amount small enough to be kept on each compute node. Experiments show that YOLO can speed up VM boot duration 2-13 times under different levels of resource contention, with a negligible overhead on the I/O path. Finally, we underline that although YOLO has been validated in a KVM environment, it requires no modification to the hypervisor, the guest kernel or the VM image (VMI) structure, and can be used for several kinds of VMIs (in this study, Linux and Windows VMIs have been tested).
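
    The read path can be pictured as a lookup in the locally stored boot image with a fallback to the full VMI. The block-map layout below is invented for illustration and is not YOLO's actual format.

        class BootImage:
            """Serves boot-time reads from blocks cached on fast local
            storage, falling back to the full VMI for anything else."""
            def __init__(self, boot_blocks, backing):
                self.blocks = boot_blocks    # dict: block number -> bytes
                self.backing = backing       # full VMI on shared storage

            def read(self, block_no):
                if block_no in self.blocks:
                    return self.blocks[block_no]    # local hit: no remote I/O
                return self.backing.read(block_no)  # miss: fetch from the VMI

        class FakeVMI:
            def read(self, block_no):
                return b"\x00" * 4096    # stand-in for a slow remote read

        img = BootImage({0: b"MBR" + b"\x00" * 4093}, FakeVMI())
        assert img.read(0)[:3] == b"MBR"    # served from the boot image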

    Freezing time: emulating new and faster devices with virtual machines

    Recent proposals of emerging data storage devices make it necessary to reevaluate all levels of the storage hierarchy to optimize the performance of the software stack. However, these new devices are not always widely available, so early experiments may be impossible. Emulators aim at mimicking the behavior of a component as closely as possible; nonetheless, emulating new and fast storage devices is a challenging task because of the guest's perception of time. In this work, we propose an approach to emulate storage devices using virtual machines (VMs), allowing the evaluation of a new device within a real system. We use a technique called freezing time, which pauses a VM to manipulate its clock and hide the real I/O completion time. Our approach is implemented at the hypervisor level and is transparent to the guest operating system and applications. We evaluate the technique on a real system, using regular magnetic disks to emulate faster storage devices. Our method presented a latency error of 6.5% compared to a real device. Moreover, a decoupled experiment between two laboratories, the Barcelona Supercomputing Center (BSC) in Spain and the Center of Computer Science and Free Software (C3SL) in Brazil, demonstrated that our approach is reproducible and promising for the virtual evaluation of next-gen storage devices. This work was partially supported by the Spanish Ministry of Science and Innovation under the TIN2015-65316 grant, the Generalitat de Catalunya under contract 2014-SGR-1051, and the Serrapilheira Institute (grant Serra-1709-16621), as well as the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement no. 671951 (NEXTGenIO) for the extensions added after the MASCOTS paper.
    Peer Reviewed. Postprint (author's final draft).
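
    The core of the technique reduces to simple arithmetic: when the backing disk needs t_real to complete an I/O that the emulated device would finish in t_emu, the VM is paused and its clock is advanced by only t_emu, hiding the difference. The figures below are illustrative, not measurements from the paper.

        def time_to_hide(t_real_ms, t_emu_ms):
            """Milliseconds that must be frozen away from the guest clock."""
            return max(0.0, t_real_ms - t_emu_ms)

        # A 10 ms magnetic-disk read emulating a 0.1 ms next-gen device:
        print(time_to_hide(t_real_ms=10.0, t_emu_ms=0.1))   # -> 9.9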

    Freezing Time: a new approach for emulating fast storage devices using VMs

    Recently we have seen a considerable effort from both academia and industry in proposing new technologies for storage devices. These devices are often not readily available for evaluation, and methods that allow testing them based only on their performance parameters are an important tool for system administrators. Simulators are a traditional approach for carrying out such evaluations; however, they are more suitable for evaluating the storage device as an isolated component, mostly due to time constraints. In this paper, we propose an approach based on virtual machine technology that is capable of emulating storage devices transparently to the operating system, allowing the evaluation of simulated devices within a real system using any synthetic or real workload. To emulate devices in real environments, it is necessary to use currently available devices as the storage medium, which becomes a problem when the device to be emulated is faster than this storage medium. To circumvent this limitation, we introduce a new technique called Freezing Time, which takes advantage of the virtual machine pausing mechanism to manipulate the virtual machine clock and hide the real I/O completion time. Our approach only requires the hypervisor to be modified, providing a high degree of compatibility and flexibility, since it is not necessary to modify either the operating system or the application. We evaluate our tool on a real system, using old magnetic disks to emulate faster storage devices. Experiments using our technique presented an average latency error of 6.08% for read operations and 6.78% for write operations when compared to a real device. This work was partially supported by the Spanish Ministry of Science and Innovation under the TIN2015-65316 grant and the Generalitat de Catalunya under contract 2014-SGR-1051.
    Peer Reviewed. Postprint (author's final draft).

    Arbitration policies for on-demand user-level I/O forwarding on HPC platforms

    I/O forwarding is a well-established and widely adopted technique in HPC to reduce contention in the access to storage servers and transparently improve I/O performance. Rather than having applications directly access the shared parallel file system, the forwarding technique defines a set of I/O nodes responsible for receiving application requests and forwarding them to the file system, thus reshaping the flow of requests. The typical approach is to statically assign I/O nodes to applications depending on the number of compute nodes they use, which is not necessarily related to their I/O requirements; this approach therefore leads to inefficient usage of these resources. This paper investigates arbitration policies based on the applications' I/O demands, represented by their access patterns. We propose a policy based on the Multiple-Choice Knapsack problem that seeks to maximize global bandwidth by giving more I/O nodes to the applications that will benefit the most. Furthermore, we propose a user-level I/O forwarding solution as an on-demand service capable of applying different allocation policies at runtime, for machines where this layer is not present. We demonstrate our approach's applicability through extensive experimentation and show that it can transparently improve global I/O bandwidth by up to 85% in a live setup compared to the default static policy. This study was financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. It has also received support from the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil. It is also partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant PID2019-107255GB, and the Generalitat de Catalunya under contract 2014-SGR-1051. The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the Barcelona Supercomputing Center. Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations (see https://www.grid5000.fr).
    Peer Reviewed. Postprint (author's final draft).
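
    As a concrete illustration of the Multiple-Choice Knapsack formulation, the sketch below treats each application as a class whose options pair an I/O-node count with a predicted bandwidth, and selects exactly one option per application so that aggregate bandwidth is maximized within the available I/O nodes. The bandwidth predictions are invented for the example; the paper derives them from observed access patterns.

        def mck_max_bandwidth(choices, total_nodes):
            """choices[i]: list of (io_nodes, predicted_bw) options for
            application i; exactly one option is taken per application."""
            NEG = float("-inf")
            best = [NEG] * (total_nodes + 1)
            best[0] = 0.0
            for opts in choices:                  # one knapsack class per app
                nxt = [NEG] * (total_nodes + 1)
                for used, bw_sum in enumerate(best):
                    if bw_sum == NEG:
                        continue
                    for k, bw in opts:
                        if used + k <= total_nodes:
                            nxt[used + k] = max(nxt[used + k], bw_sum + bw)
                best = nxt
            return max(best)

        # Two applications competing for 4 I/O nodes (made-up predictions):
        apps = [[(1, 2.0), (2, 3.5), (4, 4.0)],   # A scales well up to 2 nodes
                [(1, 1.0), (2, 1.2)]]             # B barely benefits from more
        print(mck_max_bandwidth(apps, total_nodes=4))   # -> 4.7 (A:2, B:2)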

    GekkoFS: A temporary distributed file system for HPC applications

    We present GekkoFS, a temporary, highly scalable burst buffer file system which has been specifically optimized for the new access patterns of data-intensive High-Performance Computing (HPC) applications. The file system provides relaxed POSIX semantics, only offering the features that are actually required by most (though not all) applications. It provides scalable I/O performance and reaches millions of metadata operations already for a small number of nodes, significantly outperforming the capabilities of general-purpose parallel file systems. The work has been funded by the German Research Foundation (DFG) through the ADA-FS project as part of the Priority Programme 1648. It is also supported by the Spanish Ministry of Science and Innovation (TIN2015-65316), the Generalitat de Catalunya (2014-SGR-1051), the European Union's Horizon 2020 Research and Innovation Programme (NEXTGenIO, 671951) and the European Commission's BigStorage project (H2020-MSCA-ITN-2014-642963). This research was conducted using the supercomputer MOGON II and services offered by Johannes Gutenberg University Mainz.
    Peer Reviewed. Postprint (author's final draft).
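
    One way a distributed file system can reach such metadata rates is to avoid a central metadata server and spread responsibility for each path across the participating nodes, for instance by hashing the full file path. The sketch below shows the idea; the hash function and node count are assumptions, not GekkoFS's exact scheme.

        import hashlib

        def metadata_owner(path, num_nodes):
            """Map a file path to the node responsible for its metadata."""
            digest = hashlib.md5(path.encode()).digest()
            return int.from_bytes(digest[:8], "little") % num_nodes

        print(metadata_owner("/job42/output/part-0007", num_nodes=16))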

    Energy-aware scheduling in virtualized datacenters

    The reduction of energy consumption in large-scale datacenters is being accomplished through an extensive use of virtualization, which enables the consolidation of multiple workloads on a smaller number of machines. Nevertheless, virtualization also incurs additional overheads (e.g. virtual machine creation and migration) that can influence which consolidated configuration is best, and thus they must be taken into account. In this paper, we present a dynamic job scheduling policy for power-aware resource allocation in a virtualized datacenter. Our policy tries to consolidate workloads from separate machines onto a smaller number of nodes, while providing the hardware resources needed to preserve the quality of service of each job. This allows turning off the spare servers, thus reducing the overall datacenter power consumption. As a novelty, this policy incorporates all the virtualization overheads in the decision process. In addition, our policy is prepared to consider other important datacenter parameters, such as reliability or dynamic SLA enforcement, in a synergistic way with power consumption. The policy is evaluated against common policies in a simulated environment that accurately models the execution of HPC jobs in a virtualized datacenter, including power consumption modeling, and it obtains a power consumption reduction of 15% with respect to typical policies.
    Peer Reviewed. Postprint (published version).
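
    A minimal sketch of such a consolidation pass follows, assuming each job is characterized by a single CPU share and charging a fixed migration overhead before a move is accepted; a real policy would also weigh the reliability and SLA parameters mentioned above.

        def consolidate(nodes, capacity, migration_cost=0.05):
            """nodes: {name: [job CPU shares]}. Tries to evacuate the least
            loaded nodes onto the most loaded ones so they can be turned off."""
            loads = {n: sum(jobs) for n, jobs in nodes.items()}
            migrations, off, receivers = [], [], set()
            for src in sorted(nodes, key=lambda n: loads[n]):
                if src in receivers:
                    continue                       # already a packing target
                trial, plan = dict(loads), []
                for job in nodes[src]:
                    dst = next((d for d in sorted(trial, key=trial.get,
                                                  reverse=True)
                                if d != src and d not in off and
                                trial[d] + job + migration_cost <= capacity),
                               None)
                    if dst is None:
                        plan = None                # node cannot be emptied
                        break
                    plan.append((job, src, dst))
                    trial[dst] += job + migration_cost
                if plan is not None:
                    loads = trial
                    loads[src] = 0.0
                    receivers.update(d for _, _, d in plan)
                    migrations += plan
                    off.append(src)                # spare server: power off
            return migrations, off

        moves, off = consolidate({"n1": [0.3], "n2": [0.5, 0.2],
                                  "n3": [0.6]}, capacity=1.0)
        print(moves, off)   # n1's job moves to n3, and n1 can be powered off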